Importing the data
As a person of many talents, you’re ready to take on a different job: nutrition analysis! Your goal is to analyze the sugar content of a sample of foods from around the world.
A large dataset called food.csv is ready for your use in the working directory. Instead of the usual read.csv(), however, you’re going to use the faster fread() from the data.table package. By default, the data will come in as a data table, but since you’re used to working with data frames, you can get fread() to return one by setting data.table = FALSE.
[Note: In order to make these exercises manageable, we’ve taken a random subset of the original data. The dataset you’ll be working with may not be large enough for fread() to make a huge difference, but be aware that there will be times when read.csv() just won’t cut it.]
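To make the difference between the two return types concrete, here is a minimal sketch. It assumes a small made-up CSV written to a temporary file; the `toy` data frame, its values, and the `tmp` path are hypothetical and are not part of food.csv.

```r
library(data.table)

# Hypothetical toy data, written to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
toy <- data.frame(product = c("jam", "chocolate", "seeds"),
                  sugars_100g = c(54, NA, 2.9))
write.csv(toy, tmp, row.names = FALSE)

# Default behaviour: fread() returns a data.table
dt <- fread(tmp)

# With data.table = FALSE, fread() returns a plain data frame
df <- fread(tmp, data.table = FALSE)

class(dt)
class(df)
```

A data.table also inherits from data.frame, so most data frame code still works on `dt`; setting `data.table = FALSE` simply avoids data.table's different subsetting semantics.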
# Load data.table
library(data.table)
# Import food.csv as a data frame: food
food <- fread("../xDatasets/food.csv", data.table = FALSE)

# View summary of food
sum_food <- as.data.frame(do.call(cbind, lapply(food, summary)))
## Warning in (function (..., deparse.level = 1) : number of rows of result is
## not a multiple of vector length (arg 1)
# kable() comes from knitr; kable_styling(), row_spec() and %>% come from kableExtra
library(knitr)
library(kableExtra)
sum_food %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left", font_size = 11) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#3f7689")

| | V1 | code | url | creator | created_t | created_datetime | last_modified_t | last_modified_datetime | product_name | generic_name | quantity | packaging | packaging_tags | brands | brands_tags | categories | categories_tags | categories_en | origins | origins_tags | manufacturing_places | manufacturing_places_tags | labels | labels_tags | labels_en | emb_codes | emb_codes_tags | first_packaging_code_geo | cities | cities_tags | purchase_places | stores | countries | countries_tags | countries_en | ingredients_text | allergens | allergens_en | traces | traces_tags | traces_en | serving_size | no_nutriments | additives_n | additives | additives_tags | additives_en | ingredients_from_palm_oil_n | ingredients_from_palm_oil | ingredients_from_palm_oil_tags | ingredients_that_may_be_from_palm_oil_n | ingredients_that_may_be_from_palm_oil | ingredients_that_may_be_from_palm_oil_tags | nutrition_grade_uk | nutrition_grade_fr | pnns_groups_1 | pnns_groups_2 | states | states_tags | states_en | main_category | main_category_en | image_url | image_small_url | energy_100g | energy_from_fat_100g | fat_100g | saturated_fat_100g | butyric_acid_100g | caproic_acid_100g | caprylic_acid_100g | capric_acid_100g | lauric_acid_100g | myristic_acid_100g | palmitic_acid_100g | stearic_acid_100g | arachidic_acid_100g | behenic_acid_100g | lignoceric_acid_100g | cerotic_acid_100g | montanic_acid_100g | melissic_acid_100g | monounsaturated_fat_100g | polyunsaturated_fat_100g | omega_3_fat_100g | alpha_linolenic_acid_100g | eicosapentaenoic_acid_100g | docosahexaenoic_acid_100g | omega_6_fat_100g | linoleic_acid_100g | arachidonic_acid_100g | gamma_linolenic_acid_100g | dihomo_gamma_linolenic_acid_100g | omega_9_fat_100g | oleic_acid_100g | elaidic_acid_100g | gondoic_acid_100g | mead_acid_100g | erucic_acid_100g | nervonic_acid_100g | trans_fat_100g | cholesterol_100g | carbohydrates_100g | sugars_100g | sucrose_100g | glucose_100g | fructose_100g | lactose_100g | maltose_100g | maltodextrins_100g | starch_100g | polyols_100g | fiber_100g | proteins_100g | casein_100g | serum_proteins_100g | nucleotides_100g | salt_100g | sodium_100g | alcohol_100g | vitamin_a_100g | beta_carotene_100g | vitamin_d_100g | vitamin_e_100g | vitamin_k_100g | vitamin_c_100g | vitamin_b1_100g | vitamin_b2_100g | vitamin_pp_100g | vitamin_b6_100g | vitamin_b9_100g | vitamin_b12_100g | biotin_100g | pantothenic_acid_100g | silica_100g | bicarbonate_100g | potassium_100g | chloride_100g | calcium_100g | phosphorus_100g | iron_100g | magnesium_100g | zinc_100g | copper_100g | manganese_100g | fluoride_100g | selenium_100g | chromium_100g | molybdenum_100g | iodine_100g | caffeine_100g | taurine_100g | ph_100g | fruits_vegetables_nuts_100g | collagen_meat_protein_ratio_100g | cocoa_100g | chlorophyl_100g | carbon_footprint_100g | nutrition_score_fr_100g | nutrition_score_uk_100g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. | 1 | 100030 | 1500 | 1500 | 1332073018 | 1500 | 1340209117 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | logical | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | logical | 1500 | 1500 | 1500 | 1500 | logical | 0 | 1500 | 1500 | 1500 | 0 | logical | 1500 | 0 | logical | 1500 | logical | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 0 | 0 | 0 | 0 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 0 | 0.4 | 0.033 | 0.08 | 0.721 | 1.09 | 0.25 | 0.5 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 0 | 0 | 0 | 0 | logical | logical | 100 | 0 | logical | logical | 0 | 8.6 | 0 | 0 | 1.1 | logical | logical | 0 | 0 | 0 | 0 | logical | 7.5e-07 | 5e-04 | 5.3e-06 | 0 | 6e-05 | 0.000176 | 0.00059 | 6.6e-05 | 1.13e-05 | 2e-07 | 1.9e-06 | 9e-07 | 0.00082 | 0.00063 | 4e-05 | 3e-04 | 0 | 0.043 | 0 | 5e-05 | 5e-04 | 3.6e-05 | 6.5e-06 | 2.7e-06 | 1.44e-06 | logical | logical | 1e-05 | logical | logical | logical | 2 | 12 | 30 | logical | 12 | -12 | -12 |
| 1st Qu. | 375.75 | 124974.5 | character | character | 1393744722 | character | 1424291534.75 | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | 1500 | character | character | character | character | character | character | character | character | 1500 | character | character | character | character | 1500 | 0 | character | character | character | 0 | 1500 | character | 0 | 1500 | character | 1500 | character | character | character | character | character | character | character | character | character | character | 369.75 | 35.975 | 0.9 | 0.2 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 3.87 | 1.6525 | 1.3 | 0.0905 | 0.721 | 1.09 | 0.25 | 0.5165 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 0 | 0 | 3.7925 | 1 | 1500 | 1500 | 100 | 0.25 | 1500 | 1500 | 9.45 | 59.1 | 0.5 | 1.5 | 1.1 | 1500 | 1500 | 0.04375 | 0.0172244094488189 | 0 | 0 | 1500 | 9.5e-07 | 0.002125 | 6.85e-06 | 0.002 | 0.0002925 | 0.00026 | 0.003325 | 0.00023 | 5e-05 | 4e-07 | 3.3e-06 | 0.000685 | 0.00082 | 0.067815 | 0.065 | 6e-04 | 0.045 | 0.19375 | 0.0012 | 0.067 | 9e-04 | 6.025e-05 | 6.5e-06 | 4.525e-06 | 1.44e-06 | 1500 | 1500 | 1e-05 | 1500 | 1500 | 1500 | 11.25 | 13.5 | 47 | 1500 | 97.425 | 1 | 0 |
| Median | 750.5 | 149514 | character | character | 1424746734.5 | character | 1436867403 | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | logical | character | character | character | character | character | character | character | character | logical | character | character | character | character | logical | 1 | character | character | character | 0 | logical | character | 0 | logical | character | logical | character | character | character | character | character | character | character | character | character | character | 966.5 | 237 | 6 | 1.7 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 9.5 | 3.9 | 3 | 0.101 | 0.721 | 1.09 | 0.25 | 0.533 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 0 | 0 | 13.5 | 4.05 | logical | logical | 100 | 0.5 | logical | logical | 39.5 | 67 | 1.75 | 6 | 1.1 | logical | logical | 0.44979 | 0.177082677165355 | 5.5 | 7e-05 | logical | 3e-06 | 0.0044 | 8.4e-06 | 0.019 | 0.00045 | 0.00093 | 0.0069 | 8e-04 | 7.3e-05 | 2e-06 | 4.7e-06 | 0.00195 | 0.00082 | 0.135 | 0.194 | 9e-04 | 0.12 | 0.3185 | 0.0042 | 0.104 | 0.00167 | 8.45e-05 | 6.5e-06 | 6.35e-06 | 1.44e-06 | logical | logical | 1e-05 | logical | logical | logical | 42 | 15 | 60 | logical | 182.85 | 7 | 6 |
| Mean | 750.5 | 149612.94 | 1500 | 1500 | 1413694024.41 | 1500 | 1430317795.218 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1.84584178498986 | 1500 | 1500 | 1500 | 0.0486815415821501 | 1500 | 1500 | 0.137931034482759 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1083.235895 | 668.407142857143 | 13.3945006313131 | 4.87399004267425 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 19.7731428571429 | 9.98555555555556 | 3.72588888888889 | 0.173666666666667 | 0.721 | 1.09 | 0.25 | 0.533 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 0.0105263157894737 | 0.0264565217391304 | 27.9578118686869 | 12.6564831460674 | 1500 | 1500 | 100 | 2.93333333333333 | 1500 | 1500 | 30.7285714285714 | 56.0555555555556 | 2.82298913043478 | 7.56324050632911 | 1.1 | 1500 | 1500 | 1.12053058111111 | 0.440933823928259 | 10.0671641791045 | 0.000303926086956522 | 1500 | 1.29393333333333e-05 | 0.00689818181818182 | 8.4e-06 | 0.024971487804878 | 0.000605 | 0.00111858823529412 | 0.008555625 | 0.0112242105263158 | 0.000110858823529412 | 1.42272727272727e-06 | 4.7e-06 | 0.00267827857142857 | 0.00082 | 0.16921 | 0.328764615384615 | 0.0144 | 0.203958235294118 | 0.377666666666667 | 0.00454708108108108 | 0.106559523809524 | 0.00158142857142857 | 8.45e-05 | 6.5e-06 | 6.35e-06 | 1.44e-06 | 1500 | 1500 | 1e-05 | 1500 | 1500 | 1500 | 36.885 | 15.6666666666667 | 57 | 1500 | 131.183333333333 | 7.94074074074074 | 7.63111111111111 |
| 3rd Qu. | 1125.25 | 174505.75 | character | character | 1436494439 | character | 1445896711.75 | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | logical | character | character | character | character | character | character | character | character | logical | character | character | character | character | logical | 3 | character | character | character | 0 | logical | character | 0 | logical | character | logical | character | character | character | character | character | character | character | character | character | character | 1641.5 | 974 | 20 | 6.5 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 29 | 12.7 | 3.2 | 0.2205 | 0.721 | 1.09 | 0.25 | 0.5495 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 0 | 0.002625 | 55 | 14.7 | logical | logical | 100 | 4.4 | logical | logical | 42.85 | 69.8 | 3.5 | 10.675 | 1.1 | logical | logical | 1.1938 | 0.47 | 13 | 0.0005975 | logical | 5.5e-06 | 0.0097 | 9.95e-06 | 0.03 | 0.0009625 | 0.00127 | 0.01405 | 0.001235 | 0.00017 | 2.245e-06 | 6.1e-06 | 0.005075 | 0.00082 | 0.2535 | 0.367 | 0.02145 | 0.1985 | 0.434 | 0.00771 | 0.13 | 0.00225 | 0.00010875 | 6.5e-06 | 8.175e-06 | 1.44e-06 | logical | logical | 1e-05 | logical | logical | logical | 52.25 | 17.5 | 70 | logical | 190.775 | 15 | 16 |
| Max. | 1500 | 199880 | character | character | 1452552527 | character | 1452553072 | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | character | 1500 | character | character | character | character | character | character | character | character | 1500 | character | character | character | character | 1500 | 17 | character | character | character | 1 | 1500 | character | 4 | 1500 | character | 1500 | character | character | character | character | character | character | character | character | character | character | 3700 | 2900 | 100 | 57 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 75 | 46.2 | 12.4 | 0.34 | 0.721 | 1.09 | 0.25 | 0.566 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 0.1 | 0.43 | 100 | 100 | 1500 | 1500 | 100 | 8.3 | 1500 | 1500 | 71 | 70 | 46.7 | 61 | 1.1 | 1500 | 1500 | 102 | 40 | 50 | 0.001346 | 1500 | 1e-04 | 0.032 | 1.15e-05 | 0.217 | 0.0013 | 0.0066 | 0.016 | 0.2 | 0.000237 | 2.5e-06 | 7.5e-06 | 0.006 | 0.00082 | 0.372 | 1.43 | 0.042 | 1 | 1.155 | 0.0137 | 0.333 | 0.0026 | 0.000133 | 6.5e-06 | 1e-05 | 1.44e-06 | 1500 | 1500 | 1e-05 | 1500 | 1500 | 1500 | 80 | 20 | 81 | 1500 | 198.7 | 28 | 28 |
| NA’s | 1 | 100030 | 1500 | 1500 | 1332073018 | 1500 | 1340209117 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | logical | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | logical | 1500 | 1500 | 1500 | 1500 | logical | 514 | 1500 | 1500 | 1500 | 514 | logical | 1500 | 514 | logical | 1500 | logical | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 700 | 1486 | 708 | 797 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 1465 | 1464 | 1491 | 1497 | 1499 | 1499 | 1499 | 1498 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 1481 | 1477 | 708 | 788 | logical | logical | 1499 | 1497 | logical | logical | 1493 | 1491 | 994 | 710 | 1499 | logical | logical | 780 | 780 | 1433 | 1477 | logical | 1485 | 1478 | 1498 | 1459 | 1478 | 1483 | 1484 | 1481 | 1483 | 1489 | 1498 | 1486 | 1499 | 1497 | 1487 | 1497 | 1449 | 1488 | 1463 | 1479 | 1493 | 1498 | 1499 | 1498 | 1499 | logical | logical | 1499 | logical | logical | logical | 1470 | 1497 | 1491 | logical | 1497 | 825 | 825 |
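The warning above, and the odd mix of values such as "character" and "logical" in the table, come from the fact that summary() returns results of different lengths depending on column type (for example, Length/Class/Mode for character columns versus quartiles for numeric ones). cbind() then recycles the shorter results, so the row labels do not apply to every column. As a minimal sketch of one way around this, the example below summarizes only the numeric columns and computes a fixed set of statistics per column. It uses a small hypothetical data frame (`toy` and its values are made up for illustration) rather than the real food data.

```r
# Hypothetical toy data standing in for food (values are made up)
toy <- data.frame(
  product_name = c("jam", "chocolate", "cream", "seeds"),
  sugars_100g  = c(54, NA, 4.2, 2.9),
  fat_100g     = c(0, NA, 16.7, 49),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns
num_cols <- toy[sapply(toy, is.numeric)]

# Compute the same five statistics for every column,
# so all results have equal length and bind without recycling
sum_toy <- sapply(num_cols, function(x) {
  c(min       = min(x, na.rm = TRUE),
    median    = median(x, na.rm = TRUE),
    mean      = mean(x, na.rm = TRUE),
    max       = max(x, na.rm = TRUE),
    n_missing = sum(is.na(x)))
})

sum_toy
```

The result is a numeric matrix with one column per variable and one row per statistic. The same pattern could be applied to `food` by replacing `toy`, assuming the numeric columns are the ones of interest.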
# View head of food
food %>%
  head() %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left", font_size = 11) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#3f7689") %>%
  scroll_box(width = "100%", height = "300px")

| V1 | code | url | creator | created_t | created_datetime | last_modified_t | last_modified_datetime | product_name | generic_name | quantity | packaging | packaging_tags | brands | brands_tags | categories | categories_tags | categories_en | origins | origins_tags | manufacturing_places | manufacturing_places_tags | labels | labels_tags | labels_en | emb_codes | emb_codes_tags | first_packaging_code_geo | cities | cities_tags | purchase_places | stores | countries | countries_tags | countries_en | ingredients_text | allergens | allergens_en | traces | traces_tags | traces_en | serving_size | no_nutriments | additives_n | additives | additives_tags | additives_en | ingredients_from_palm_oil_n | ingredients_from_palm_oil | ingredients_from_palm_oil_tags | ingredients_that_may_be_from_palm_oil_n | ingredients_that_may_be_from_palm_oil | ingredients_that_may_be_from_palm_oil_tags | nutrition_grade_uk | nutrition_grade_fr | pnns_groups_1 | pnns_groups_2 | states | states_tags | states_en | main_category | main_category_en | image_url | image_small_url | energy_100g | energy_from_fat_100g | fat_100g | saturated_fat_100g | butyric_acid_100g | caproic_acid_100g | caprylic_acid_100g | capric_acid_100g | lauric_acid_100g | myristic_acid_100g | palmitic_acid_100g | stearic_acid_100g | arachidic_acid_100g | behenic_acid_100g | lignoceric_acid_100g | cerotic_acid_100g | montanic_acid_100g | melissic_acid_100g | monounsaturated_fat_100g | polyunsaturated_fat_100g | omega_3_fat_100g | alpha_linolenic_acid_100g | eicosapentaenoic_acid_100g | docosahexaenoic_acid_100g | omega_6_fat_100g | linoleic_acid_100g | arachidonic_acid_100g | gamma_linolenic_acid_100g | dihomo_gamma_linolenic_acid_100g | omega_9_fat_100g | oleic_acid_100g | elaidic_acid_100g | gondoic_acid_100g | mead_acid_100g | erucic_acid_100g | nervonic_acid_100g | trans_fat_100g | cholesterol_100g | carbohydrates_100g | sugars_100g | sucrose_100g | glucose_100g | fructose_100g | lactose_100g | maltose_100g | maltodextrins_100g | starch_100g | polyols_100g | fiber_100g | proteins_100g | casein_100g | serum_proteins_100g | nucleotides_100g | salt_100g | sodium_100g | alcohol_100g | vitamin_a_100g | beta_carotene_100g | vitamin_d_100g | vitamin_e_100g | vitamin_k_100g | vitamin_c_100g | vitamin_b1_100g | vitamin_b2_100g | vitamin_pp_100g | vitamin_b6_100g | vitamin_b9_100g | vitamin_b12_100g | biotin_100g | pantothenic_acid_100g | silica_100g | bicarbonate_100g | potassium_100g | chloride_100g | calcium_100g | phosphorus_100g | iron_100g | magnesium_100g | zinc_100g | copper_100g | manganese_100g | fluoride_100g | selenium_100g | chromium_100g | molybdenum_100g | iodine_100g | caffeine_100g | taurine_100g | ph_100g | fruits_vegetables_nuts_100g | collagen_meat_protein_ratio_100g | cocoa_100g | chlorophyl_100g | carbon_footprint_100g | nutrition_score_fr_100g | nutrition_score_uk_100g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100030 | http://world-en.openfoodfacts.org/product/3222475745867/confiture-de-fraise-fraise-des-bois-au-sucre-de-canne-casino-delices | sebleouf | 1424747544 | 2015-02-24T03:12:24Z | 1438445887 | 2015-08-01T16:18:07Z | Confiture de fraise fraise des bois au sucre de canne | 265 g | Bocal,Verre | bocal,verre | Casino Délices | casino-delices | Aliments et boissons à base de végétaux,Aliments d’origine végétale,Aliments à base de fruits et de légumes,Petit-déjeuners,Produits à tartiner,Fruits et produits dérivés,Pâtes à tartiner végétaux,Produits à tartiner sucrés,Confitures et marmelades,Confitures,Confitures de fruits,Confitures de fruits rouges,Confitures de fraises | en:plant-based-foods-and-beverages,en:plant-based-foods,en:fruits-and-vegetables-based-foods,en:breakfasts,en:spreads,en:fruits-based-foods,en:plant-based-spreads,en:sweet-spreads,en:fruit-preserves,en:jams,en:fruit-jams,en:berry-jams,en:strawberry-jams | Plant-based foods and beverages,Plant-based foods,Fruits and vegetables based foods,Breakfasts,Spreads,Fruits based foods,Plant-based spreads,Sweet spreads,Fruit preserves,Jams,Fruit jams,Berry jams,Strawberry jams | France | france | EMB 78015 | emb-78015 | 48.983333,2.066667 | NA | andresy-yvelines-france | Lyon,France | Casino | France | en:france | France | Sucre de canne, fraises 40 g, fraises des bois 14 g, gélifiant : pectines de fruits, jus de citron concentré. Préparée avec 54 g de fruits pour 100 g de produit fini. 
| NA | Lait,Fruits à coque | en:milk,en:nuts | Milk,Nuts | 15 g | NA | 1 | [ sucre-de-canne -> fr:sucre-de-canne ] [ sucre-de -> fr:sucre-de ] [ sucre -> fr:sucre ] [ fraises-40-g -> fr:fraises-40-g ] [ fraises-40 -> fr:fraises-40 ] [ fraises -> fr:fraises ] [ fraises-des-bois-14-g -> fr:fraises-des-bois-14-g ] [ fraises-des-bois-14 -> fr:fraises-des-bois-14 ] [ fraises-des-bois -> fr:fraises-des-bois ] [ fraises-des -> fr:fraises-des ] [ fraises -> fr:fraises ] [ pectines-de-fruits -> fr:pectines-de-fruits ] [ pectines-de -> fr:pectines-de ] [ pectines -> en:e440 -> exists ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g-de-produit-fini -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g-de-produit-fini ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g-de-produit -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g-de-produit ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g-de -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g-de ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100-g ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100 -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour-100 ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits-pour ] [ jus-de-citron-concentre-preparee-avec-54-g-de-fruits -> fr:jus-de-citron-concentre-preparee-avec-54-g-de-fruits ] [ jus-de-citron-concentre-preparee-avec-54-g-de -> fr:jus-de-citron-concentre-preparee-avec-54-g-de ] [ jus-de-citron-concentre-preparee-avec-54-g -> fr:jus-de-citron-concentre-preparee-avec-54-g ] [ jus-de-citron-concentre-preparee-avec-54 -> fr:jus-de-citron-concentre-preparee-avec-54 ] [ jus-de-citron-concentre-preparee-avec -> fr:jus-de-citron-concentre-preparee-avec ] [ jus-de-citron-concentre-preparee -> 
fr:jus-de-citron-concentre-preparee ] [ jus-de-citron-concentre -> fr:jus-de-citron-concentre ] [ jus-de-citron -> fr:jus-de-citron ] [ jus-de -> fr:jus-de ] [ jus -> fr:jus ] | en:e440 | E440 - Pectins | 0 | NA | 0 | NA | NA | d | Sugary snacks | Sweets | en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:characteristics-completed, en:photos-validated, en:photos-uploaded | en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-completed,en:characteristics-completed,en:photos-validated,en:photos-uploaded | To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Characteristics completed,Photos validated,Photos uploaded | en:plant-based-foods-and-beverages | Plant-based foods and beverages | http://en.openfoodfacts.org/images/products/322/247/574/5867/front.8.400.jpg | http://en.openfoodfacts.org/images/products/322/247/574/5867/front.8.200.jpg | 918 | NA | 0.0 | 0.0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 54.0 | 54.0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.0 | NA | NA | NA | 0.0000 | 0.00 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 54 | NA | NA | NA | NA | 11 | 11 | |||||||||
| 2 | 100050 | http://world-en.openfoodfacts.org/product/5410976880110/guylian-sea-shells-selection | foodorigins | 1450316429 | 2015-12-17T01:40:29Z | 1450817956 | 2015-12-22T20:59:16Z | Guylian Sea Shells Selection | 375g | Plastic,Box | plastic,box | Guylian | guylian | Chocolate | en:sugary-snacks,en:chocolates | Sugary snacks,Chocolates | Belgium | belgium | NA | NSW,Australia | Australia | en:australia | Australia | NA | NA | NA | NA | NA | NA | NA | NA | Sugary snacks | Chocolate products | en:to-be-completed, en:nutrition-facts-to-be-completed, en:ingredients-to-be-completed, en:expiration-date-to-be-completed, en:characteristics-completed, en:photos-validated, en:photos-uploaded | en:to-be-completed,en:nutrition-facts-to-be-completed,en:ingredients-to-be-completed,en:expiration-date-to-be-completed,en:characteristics-completed,en:photos-validated,en:photos-uploaded | To be completed,Nutrition facts to be completed,Ingredients to be completed,Expiration date to be completed,Characteristics completed,Photos validated,Photos uploaded | en:sugary-snacks | Sugary snacks | http://en.openfoodfacts.org/images/products/541/097/688/0110/front.7.400.jpg | http://en.openfoodfacts.org/images/products/541/097/688/0110/front.7.200.jpg | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | |||||||||||||||||||||||
| 3 | 100079 | http://world-en.openfoodfacts.org/product/3264750423503/pates-de-fruits-aromatisees-jacquot | domdom26 | 1428674916 | 2015-04-10T14:08:36Z | 1428739289 | 2015-04-11T08:01:29Z | Pâtes de fruits aromatisées | Pâtes de fruits | 1 kg | Carton,plastique | carton,plastique | Jacquot | jacquot | pâtes de fruits | en:plant-based-foods-and-beverages,en:plant-based-foods,en:fruits-and-vegetables-based-foods,en:sugary-snacks,en:confectioneries,en:fruits-based-foods,en:fruit-pastes | Plant-based foods and beverages,Plant-based foods,Fruits and vegetables based foods,Sugary snacks,Confectioneries,Fruits based foods,Fruit pastes | NA | France | France | en:france | France | Pulpe de pommes 50% , sucre, sirop de glucose, gélifiant : pectine, acidifiant : acide citrique, arômes, colorants naturels : extrait de paprika â complexes cuivreâchlorophyllines â curcumine â antnocyanes | NA | NA | 2 | [ pulpe-de-pommes-50 -> fr:pulpe-de-pommes-50 ] [ pulpe-de-pommes -> fr:pulpe-de-pommes ] [ pulpe-de -> fr:pulpe-de ] [ pulpe -> fr:pulpe ] [ sucre -> fr:sucre ] [ sirop-de-glucose -> fr:sirop-de-glucose ] [ sirop-de -> fr:sirop-de ] [ sirop -> fr:sirop ] [ pectine -> en:e440 -> exists ] [ acide-citrique -> en:e330 -> exists ] [ aromes -> fr:aromes ] [ naturels -> fr:naturels ] [ extrait-de-paprika-complexes-cuivre-chlorophyllines-curcumine-antnocyanes -> fr:extrait-de-paprika-complexes-cuivre-chlorophyllines-curcumine-antnocyanes ] [ extrait-de-paprika-complexes-cuivre-chlorophyllines-curcumine -> fr:extrait-de-paprika-complexes-cuivre-chlorophyllines-curcumine ] [ extrait-de-paprika-complexes-cuivre-chlorophyllines -> fr:extrait-de-paprika-complexes-cuivre-chlorophyllines ] [ extrait-de-paprika-complexes-cuivre -> fr:extrait-de-paprika-complexes-cuivre ] [ extrait-de-paprika-complexes -> fr:extrait-de-paprika-complexes ] [ extrait-de-paprika -> fr:extrait-de-paprika ] [ extrait-de -> fr:extrait-de ] [ extrait -> fr:extrait ] | en:e440,en:e330 | E440 - Pectins,E330 - Citric 
acid | 0 | NA | 0 | NA | NA | Fruits and vegetables | Fruits | en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:characteristics-completed, en:photos-validated, en:photos-uploaded | en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-completed,en:characteristics-completed,en:photos-validated,en:photos-uploaded | To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Characteristics completed,Photos validated,Photos uploaded | en:plant-based-foods-and-beverages | Plant-based foods and beverages | http://en.openfoodfacts.org/images/products/326/475/042/3503/front.6.400.jpg | http://en.openfoodfacts.org/images/products/326/475/042/3503/front.6.200.jpg | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | ||||||||||||||||||||
| 4 | 100094 | http://world-en.openfoodfacts.org/product/8006040247001/nata-vegetal-a-base-de-soja-valsoia | javichu | 1420416591 | 2015-01-05T00:09:51Z | 1420417876 | 2015-01-05T00:31:16Z | Nata vegetal a base de soja "Valsoia" | Nata vegetal a base de soja | 200 ml | Tetra Brik | tetra-brik | Valsoia,//Propiedad de://,Valsoia S.p.A. | valsoia,propiedad-de,valsoia-s-p-a | Alimentos y bebidas de origen vegetal,Alimentos de origen vegetal,Natas vegetales,Natas vegetales a base de soja para cocinar,Natas vegetales para cocinar | en:plant-based-foods-and-beverages,en:plant-based-foods,en:plant-based-creams,en:plant-based-creams-for-cooking,en:soy-based-creams-for-cooking | Plant-based foods and beverages,Plant-based foods,Plant-based creams,Plant-based creams for cooking,Soy-based creams for cooking | Italia | italia | Vegetariano,Vegano,Sin gluten,Sin OMG,Sin lactosa | en:vegetarian,en:vegan,en:gluten-free,en:no-gmos,en:no-lactose | Vegetarian,Vegan,Gluten-free,No GMOs,No lactose | NA | Madrid,España | El Corte Inglés | España | en:spain | Spain | Extracto de soja (78%) (agua, semillas de soja 8,3%), grasas vegetales, jarabe de glucosa, dextrosa, emulsionante: mono- y diglicéridos de ácidos grasos (E-471), sal marina, estabilizantes: goma xantana (E-415), carragenatos (E-407), goma guar (E-412); aromas, antioxidante: extractos de tocoferoles (de soja) (E-306). (Nota: el envase en italiano del paquete -que puede verse en el enlace-, especifica que el producto es 100% vegetal. Por tanto los mono- y diglicéridos de ácidos grasos (E-471) son de origen no animal). 
| NA | NA | 5 | [ extracto-de-soja -> es:extracto-de-soja ] [ 78 -> es:78 ] [ agua -> es:agua ] [ semillas-de-soja-8 -> es:semillas-de-soja-8 ] [ 3 -> en:fd-c ] [ grasas-vegetales -> es:grasas-vegetales ] [ jarabe-de-glucosa -> es:jarabe-de-glucosa ] [ dextrosa -> es:dextrosa ] [ emulsionante -> es:emulsionante ] [ mono-y-digliceridos-de-acidos-grasos -> en:e471 -> exists ] [ e471 -> en:e471 ] [ sal-marina -> es:sal-marina ] [ estabilizantes -> es:estabilizantes ] [ goma-xantana -> en:e415 -> exists ] [ e415 -> en:e415 ] [ carragenatos -> en:e407 -> exists ] [ e407 -> en:e407 ] [ goma-guar -> en:e412 -> exists ] [ e412 -> en:e412 ] [ aromas -> es:aromas ] [ antioxidante -> es:antioxidante ] [ extractos-de-tocoferoles -> es:extractos-de-tocoferoles ] [ de-soja -> es:de-soja ] [ e306 -> en:e306 -> exists ] [ nota -> es:nota ] [ el-envase-en-italiano-del-paquete-que-puede-verse-en-el-enlace -> es:el-envase-en-italiano-del-paquete-que-puede-verse-en-el-enlace ] [ especifica-que-el-producto-es-100-vegetal-por-tanto-los-mono-y-digliceridos-de-acidos-grasos -> es:especifica-que-el-producto-es-100-vegetal-por-tanto-los-mono-y-digliceridos-de-acidos-grasos ] [ e471 -> en:e471 ] [ son-de-origen-no-animal -> es:son-de-origen-no-animal ] [ -> es: ] | en:e471,en:e415,en:e407,en:e412,en:e306 | E471 - Mono- and diglycerides of fatty acids,E415 - Xanthan gum,E407 - Carrageenan,E412 - Guar gum,E306 - Tocopherol-rich extract | 0 | NA | 1 | NA | e471-mono-et-diglycerides-d-acides-gras-alimentaires | NA | d | unknown | unknown | en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:characteristics-completed, en:photos-validated, en:photos-uploaded | en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-completed,en:characteristics-completed,en:photos-validated,en:photos-uploaded | To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date 
completed,Characteristics completed,Photos validated,Photos uploaded | en:plant-based-foods-and-beverages | Plant-based foods and beverages | http://en.openfoodfacts.org/images/products/800/604/024/7001/front.7.400.jpg | http://en.openfoodfacts.org/images/products/800/604/024/7001/front.7.200.jpg | 766 | NA | 16.7 | 9.9 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2.9 | 3.9 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2e-04 | 5.7 | 4.2 | NA | NA | NA | NA | NA | NA | NA | NA | 0.2 | 2.9 | NA | NA | NA | 0.0508 | 0.02 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 11 | 11 | ||||||||||||
| 5 | 100124 | http://world-en.openfoodfacts.org/product/8480000340764/semillas-de-girasol-con-cascara-tostadas-aguasal-hacendado | javichu | 1420501121 | 2015-01-05T23:38:41Z | 1445700917 | 2015-10-24T15:35:17Z | Semillas de girasol con cáscara tostadas aguasal | Semillas de girasol con cáscara tostadas aguasal | 200 g | Bolsa de plástico,Envasado en atmósfera protectora | bolsa-de-plastico,envasado-en-atmosfera-protectora | Hacendado,//Propiedad de://,Mercadona S.A. | hacendado,propiedad-de,mercadona-s-a | Semillas de girasol y derivados, Semillas, Semillas de girasol, Semillas de girasol con cáscara, Semillas de girasol tostadas, Semillas de girasol con cáscara tostadas, Semillas de girasol con cáscara tostadas aguasal | en:plant-based-foods-and-beverages,en:plant-based-foods,en:seeds,en:sunflower-seeds-and-their-products,en:sunflower-seeds,en:roasted-sunflower-seeds,en:unshelled-sunflower-seeds,en:roasted-unshelled-sunflower-seeds,es:semillas-de-girasol-con-cascara-tostadas-aguasal | Plant-based foods and beverages,Plant-based foods,Seeds,Sunflower seeds and their products,Sunflower seeds,Roasted sunflower seeds,Unshelled sunflower seeds,Roasted unshelled sunflower seeds,es:Semillas-de-girasol-con-cascara-tostadas-aguasal | Argentina | argentina | Beniparrell,Valencia (provincia),Comunidad Valenciana,España | beniparrell,valencia-provincia,comunidad-valenciana,espana | Vegetariano,Vegano,Sin gluten | en:vegetarian,en:vegan,en:gluten-free | Vegetarian,Vegan,Gluten-free | ES 21.016540/V EC,ENVASADOR:,IMPORTACO S.A. | es-21-016540-v-ec,envasador,importaco-s-a | NA | Madrid,España | Mercadona | España | en:spain | Spain | Pipas de girasol y sal. 
| NA | Frutos de cáscara,Cacahuetes | en:nuts,en:peanuts | Nuts,Peanuts | NA | 0 | [ pipas-de-girasol-y-sal -> es:pipas-de-girasol-y-sal ] | 0 | NA | 0 | NA | NA | d | unknown | unknown | en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:characteristics-completed, en:photos-validated, en:photos-uploaded | en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-completed,en:characteristics-completed,en:photos-validated,en:photos-uploaded | To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date completed,Characteristics completed,Photos validated,Photos uploaded | en:plant-based-foods-and-beverages | Plant-based foods and beverages | http://en.openfoodfacts.org/images/products/848/000/034/0764/front.6.400.jpg | http://en.openfoodfacts.org/images/products/848/000/034/0764/front.6.200.jpg | 2359 | NA | 45.5 | 5.2 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 9.5 | 32.8 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 17.3 | 2.7 | NA | NA | NA | NA | NA | NA | NA | NA | 9.0 | 18.2 | NA | NA | NA | 3.9878 | 1.57 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1.155 | 0.0038 | 0.129 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 17 | 17 | ||||||||
| 6 | 100136 | http://world-en.openfoodfacts.org/product/0087703177727/soft-drink | foodorigins | 1437983923 | 2015-07-27T07:58:43Z | 1445577476 | 2015-10-23T05:17:56Z | Soft Drink | South Korea | south-korea | South Korea | south-korea | NA | Australia | en:australia | Australia | NA | NA | NA | NA | NA | NA | NA | NA | unknown | unknown | en:to-be-completed, en:nutrition-facts-to-be-completed, en:ingredients-to-be-completed, en:expiration-date-to-be-completed, en:characteristics-to-be-completed, en:categories-to-be-completed, en:brands-to-be-completed, en:packaging-to-be-completed, en:quantity-to-be-completed, en:photos-to-be-validated, en:photos-uploaded | en:to-be-completed,en:nutrition-facts-to-be-completed,en:ingredients-to-be-completed,en:expiration-date-to-be-completed,en:characteristics-to-be-completed,en:categories-to-be-completed,en:brands-to-be-completed,en:packaging-to-be-completed,en:quantity-to-be-completed,en:photos-to-be-validated,en:photos-uploaded | To be completed,Nutrition facts to be completed,Ingredients to be completed,Expiration date to be completed,Characteristics to be completed,Categories to be completed,Brands to be completed,Packaging to be completed,Quantity to be completed,Photos to be validated,Photos uploaded | http://en.openfoodfacts.org/images/products/008/770/317/7727/front.8.400.jpg | http://en.openfoodfacts.org/images/products/008/770/317/7727/front.8.200.jpg | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
# View structure of food
str(food, give.attr = FALSE)
## 'data.frame': 1500 obs. of 160 variables:
## $ V1 : int 1 2 3 4 5 6 7 8 9 10 ...
## $ code : int 100030 100050 100079 100094 100124 100136 100194 100221 100257 100258 ...
## $ url : chr "http://world-en.openfoodfacts.org/product/3222475745867/confiture-de-fraise-fraise-des-bois-au-sucre-de-canne-casino-delices" "http://world-en.openfoodfacts.org/product/5410976880110/guylian-sea-shells-selection" "http://world-en.openfoodfacts.org/product/3264750423503/pates-de-fruits-aromatisees-jacquot" "http://world-en.openfoodfacts.org/product/8006040247001/nata-vegetal-a-base-de-soja-valsoia" ...
## $ creator : chr "sebleouf" "foodorigins" "domdom26" "javichu" ...
## $ created_t : int 1424747544 1450316429 1428674916 1420416591 1420501121 1437983923 1442420988 1435686217 1436991777 1400516512 ...
## $ created_datetime : chr "2015-02-24T03:12:24Z" "2015-12-17T01:40:29Z" "2015-04-10T14:08:36Z" "2015-01-05T00:09:51Z" ...
## $ last_modified_t : int 1438445887 1450817956 1428739289 1420417876 1445700917 1445577476 1442420988 1451405288 1436991779 1437236856 ...
## $ last_modified_datetime : chr "2015-08-01T16:18:07Z" "2015-12-22T20:59:16Z" "2015-04-11T08:01:29Z" "2015-01-05T00:31:16Z" ...
## $ product_name : chr "Confiture de fraise fraise des bois au sucre de canne" "Guylian Sea Shells Selection" "Pâtes de fruits aromatisées" "Nata vegetal a base de soja "Valsoia"" ...
## $ generic_name : chr "" "" "Pâtes de fruits" "Nata vegetal a base de soja" ...
## $ quantity : chr "265 g" "375g" "1 kg" "200 ml" ...
## $ packaging : chr "Bocal,Verre" "Plastic,Box" "Carton,plastique" "Tetra Brik" ...
## $ packaging_tags : chr "bocal,verre" "plastic,box" "carton,plastique" "tetra-brik" ...
## $ brands : chr "Casino Délices" "Guylian" "Jacquot" "Valsoia,//Propiedad de://,Valsoia S.p.A." ...
## $ brands_tags : chr "casino-delices" "guylian" "jacquot" "valsoia,propiedad-de,valsoia-s-p-a" ...
## $ categories : chr "Aliments et boissons à base de végétaux,Aliments d'origine végétale,Aliments à base de fruits et de légu"| __truncated__ "Chocolate" "pâtes de fruits" "Alimentos y bebidas de origen vegetal,Alimentos de origen vegetal,Natas vegetales,Natas vegetales a base de soj"| __truncated__ ...
## $ categories_tags : chr "en:plant-based-foods-and-beverages,en:plant-based-foods,en:fruits-and-vegetables-based-foods,en:breakfasts,en:s"| __truncated__ "en:sugary-snacks,en:chocolates" "en:plant-based-foods-and-beverages,en:plant-based-foods,en:fruits-and-vegetables-based-foods,en:sugary-snacks,e"| __truncated__ "en:plant-based-foods-and-beverages,en:plant-based-foods,en:plant-based-creams,en:plant-based-creams-for-cooking"| __truncated__ ...
## $ categories_en : chr "Plant-based foods and beverages,Plant-based foods,Fruits and vegetables based foods,Breakfasts,Spreads,Fruits b"| __truncated__ "Sugary snacks,Chocolates" "Plant-based foods and beverages,Plant-based foods,Fruits and vegetables based foods,Sugary snacks,Confectioneri"| __truncated__ "Plant-based foods and beverages,Plant-based foods,Plant-based creams,Plant-based creams for cooking,Soy-based c"| __truncated__ ...
## $ origins : chr "" "" "" "" ...
## $ origins_tags : chr "" "" "" "" ...
## $ manufacturing_places : chr "France" "Belgium" "" "Italia" ...
## $ manufacturing_places_tags : chr "france" "belgium" "" "italia" ...
## $ labels : chr "" "" "" "Vegetariano,Vegano,Sin gluten,Sin OMG,Sin lactosa" ...
## $ labels_tags : chr "" "" "" "en:vegetarian,en:vegan,en:gluten-free,en:no-gmos,en:no-lactose" ...
## $ labels_en : chr "" "" "" "Vegetarian,Vegan,Gluten-free,No GMOs,No lactose" ...
## $ emb_codes : chr "EMB 78015" "" "" "" ...
## $ emb_codes_tags : chr "emb-78015" "" "" "" ...
## $ first_packaging_code_geo : chr "48.983333,2.066667" "" "" "" ...
## $ cities : logi NA NA NA NA NA NA ...
## $ cities_tags : chr "andresy-yvelines-france" "" "" "" ...
## $ purchase_places : chr "Lyon,France" "NSW,Australia" "France" "Madrid,España" ...
## $ stores : chr "Casino" "" "" "El Corte Inglés" ...
## $ countries : chr "France" "Australia" "France" "España" ...
## $ countries_tags : chr "en:france" "en:australia" "en:france" "en:spain" ...
## $ countries_en : chr "France" "Australia" "France" "Spain" ...
## $ ingredients_text : chr "Sucre de canne, fraises 40 g, fraises des bois 14 g, gélifiant : pectines de fruits, jus de citron concentré."| __truncated__ "" "Pulpe de pommes 50% , sucre, sirop de glucose, gélifiant : pectine, acidifiant : acide citrique, arômes, colo"| __truncated__ "Extracto de soja (78%) (agua, semillas de soja 8,3%), grasas vegetales, jarabe de glucosa, dextrosa, emulsionan"| __truncated__ ...
## $ allergens : chr "" "" "" "" ...
## $ allergens_en : logi NA NA NA NA NA NA ...
## $ traces : chr "Lait,Fruits à coque" "" "" "" ...
## $ traces_tags : chr "en:milk,en:nuts" "" "" "" ...
## $ traces_en : chr "Milk,Nuts" "" "" "" ...
## $ serving_size : chr "15 g" "" "" "" ...
## $ no_nutriments : logi NA NA NA NA NA NA ...
## $ additives_n : int 1 NA 2 5 0 NA NA 0 NA 1 ...
## $ additives : chr "[ sucre-de-canne -> fr:sucre-de-canne ] [ sucre-de -> fr:sucre-de ] [ sucre -> fr:sucre ] [ fraises-40-g "| __truncated__ "" "[ pulpe-de-pommes-50 -> fr:pulpe-de-pommes-50 ] [ pulpe-de-pommes -> fr:pulpe-de-pommes ] [ pulpe-de -> fr:"| __truncated__ "[ extracto-de-soja -> es:extracto-de-soja ] [ 78 -> es:78 ] [ agua -> es:agua ] [ semillas-de-soja-8 -> e"| __truncated__ ...
## $ additives_tags : chr "en:e440" "" "en:e440,en:e330" "en:e471,en:e415,en:e407,en:e412,en:e306" ...
## $ additives_en : chr "E440 - Pectins" "" "E440 - Pectins,E330 - Citric acid" "E471 - Mono- and diglycerides of fatty acids,E415 - Xanthan gum,E407 - Carrageenan,E412 - Guar gum,E306 - Tocop"| __truncated__ ...
## $ ingredients_from_palm_oil_n : int 0 NA 0 0 0 NA NA 0 NA 0 ...
## $ ingredients_from_palm_oil : logi NA NA NA NA NA NA ...
## $ ingredients_from_palm_oil_tags : chr "" "" "" "" ...
## $ ingredients_that_may_be_from_palm_oil_n : int 0 NA 0 1 0 NA NA 0 NA 0 ...
## $ ingredients_that_may_be_from_palm_oil : logi NA NA NA NA NA NA ...
## $ ingredients_that_may_be_from_palm_oil_tags: chr "" "" "" "e471-mono-et-diglycerides-d-acides-gras-alimentaires" ...
## $ nutrition_grade_uk : logi NA NA NA NA NA NA ...
## $ nutrition_grade_fr : chr "d" "" "" "d" ...
## $ pnns_groups_1 : chr "Sugary snacks" "Sugary snacks" "Fruits and vegetables" "unknown" ...
## $ pnns_groups_2 : chr "Sweets" "Chocolate products" "Fruits" "unknown" ...
## $ states : chr "en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be"| __truncated__ "en:to-be-completed, en:nutrition-facts-to-be-completed, en:ingredients-to-be-completed, en:expiration-date-to-b"| __truncated__ "en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be"| __truncated__ "en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-compl"| __truncated__ ...
## $ states_tags : chr "en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-com"| __truncated__ "en:to-be-completed,en:nutrition-facts-to-be-completed,en:ingredients-to-be-completed,en:expiration-date-to-be-c"| __truncated__ "en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-to-be-com"| __truncated__ "en:to-be-checked,en:complete,en:nutrition-facts-completed,en:ingredients-completed,en:expiration-date-completed"| __truncated__ ...
## $ states_en : chr "To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Characte"| __truncated__ "To be completed,Nutrition facts to be completed,Ingredients to be completed,Expiration date to be completed,Cha"| __truncated__ "To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date to be completed,Characte"| __truncated__ "To be checked,Complete,Nutrition facts completed,Ingredients completed,Expiration date completed,Characteristic"| __truncated__ ...
## $ main_category : chr "en:plant-based-foods-and-beverages" "en:sugary-snacks" "en:plant-based-foods-and-beverages" "en:plant-based-foods-and-beverages" ...
## $ main_category_en : chr "Plant-based foods and beverages" "Sugary snacks" "Plant-based foods and beverages" "Plant-based foods and beverages" ...
## $ image_url : chr "http://en.openfoodfacts.org/images/products/322/247/574/5867/front.8.400.jpg" "http://en.openfoodfacts.org/images/products/541/097/688/0110/front.7.400.jpg" "http://en.openfoodfacts.org/images/products/326/475/042/3503/front.6.400.jpg" "http://en.openfoodfacts.org/images/products/800/604/024/7001/front.7.400.jpg" ...
## $ image_small_url : chr "http://en.openfoodfacts.org/images/products/322/247/574/5867/front.8.200.jpg" "http://en.openfoodfacts.org/images/products/541/097/688/0110/front.7.200.jpg" "http://en.openfoodfacts.org/images/products/326/475/042/3503/front.6.200.jpg" "http://en.openfoodfacts.org/images/products/800/604/024/7001/front.7.200.jpg" ...
## $ energy_100g : num 918 NA NA 766 2359 ...
## $ energy_from_fat_100g : num NA NA NA NA NA NA NA NA NA NA ...
## $ fat_100g : num 0 NA NA 16.7 45.5 NA NA 25 NA 4 ...
## $ saturated_fat_100g : num 0 NA NA 9.9 5.2 NA NA 17 NA 0.54 ...
## $ butyric_acid_100g : logi NA NA NA NA NA NA ...
## $ caproic_acid_100g : logi NA NA NA NA NA NA ...
## $ caprylic_acid_100g : logi NA NA NA NA NA NA ...
## $ capric_acid_100g : logi NA NA NA NA NA NA ...
## $ lauric_acid_100g : logi NA NA NA NA NA NA ...
## $ myristic_acid_100g : logi NA NA NA NA NA NA ...
## $ palmitic_acid_100g : logi NA NA NA NA NA NA ...
## $ stearic_acid_100g : logi NA NA NA NA NA NA ...
## $ arachidic_acid_100g : logi NA NA NA NA NA NA ...
## $ behenic_acid_100g : logi NA NA NA NA NA NA ...
## $ lignoceric_acid_100g : logi NA NA NA NA NA NA ...
## $ cerotic_acid_100g : logi NA NA NA NA NA NA ...
## $ montanic_acid_100g : logi NA NA NA NA NA NA ...
## $ melissic_acid_100g : logi NA NA NA NA NA NA ...
## $ monounsaturated_fat_100g : num NA NA NA 2.9 9.5 NA NA NA NA NA ...
## $ polyunsaturated_fat_100g : num NA NA NA 3.9 32.8 NA NA NA NA NA ...
## $ omega_3_fat_100g : num NA NA NA NA NA NA NA NA NA NA ...
## $ alpha_linolenic_acid_100g : num NA NA NA NA NA NA NA NA NA NA ...
## $ eicosapentaenoic_acid_100g : num NA NA NA NA NA NA NA NA NA NA ...
## $ docosahexaenoic_acid_100g : num NA NA NA NA NA NA NA NA NA NA ...
## $ omega_6_fat_100g : num NA NA NA NA NA NA NA NA NA NA ...
## $ linoleic_acid_100g : num NA NA NA NA NA NA NA NA NA NA ...
## $ arachidonic_acid_100g : logi NA NA NA NA NA NA ...
## $ gamma_linolenic_acid_100g : logi NA NA NA NA NA NA ...
## $ dihomo_gamma_linolenic_acid_100g : logi NA NA NA NA NA NA ...
## $ omega_9_fat_100g : logi NA NA NA NA NA NA ...
## $ oleic_acid_100g : logi NA NA NA NA NA NA ...
## $ elaidic_acid_100g : logi NA NA NA NA NA NA ...
## $ gondoic_acid_100g : logi NA NA NA NA NA NA ...
## $ mead_acid_100g : logi NA NA NA NA NA NA ...
## $ erucic_acid_100g : logi NA NA NA NA NA NA ...
## [list output truncated]
Information overload. With datasets this big, it’s hard to get a handle on exactly what they contain.
Inspecting variables
The str(), head(), and summary() functions are designed to give you some information about a dataset without being overwhelming. However, this dataset has so many rows and variables that even their output is hard to digest.
The glimpse() function from the dplyr package often formats information in a more approachable way.
Yet another option is to just look at the column names to see what kinds of data you have. As you look at the names, pay particular attention to any pairs that look like duplicates.
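Beyond reading the names, a compact overview can also be computed. The following is a minimal sketch, not part of the original exercise: it counts columns by type and measures the share of missing values per column. The `toy` data frame and its values are hypothetical stand-ins for `food`.

```r
# Toy stand-in for `food` (hypothetical values, not the real data)
toy <- data.frame(
  product_name = c("Jam", "Chocolate", "Seeds"),
  countries    = c("France", "Australia", "Spain"),
  sugars_100g  = c(54, NA, 2.7),
  cities       = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Count columns by their (first) class
type_counts <- table(vapply(toy, function(col) class(col)[1], character(1)))
print(type_counts)

# Fraction of missing values in each column
na_share <- vapply(toy, function(col) mean(is.na(col)), numeric(1))
print(round(na_share, 2))

# Columns that contain nothing but NA
all_na <- names(na_share)[na_share == 1]
print(all_na)
```

On the real data, replacing `toy` with `food` should point out the many all-NA logical columns visible in the str() output.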
# Load dplyr
library(dplyr)
# View a glimpse of food
glimpse(food)
## Observations: 1,500
## Variables: 160
## $ V1 <int> 1, 2, 3, 4, 5, 6, 7...
## $ code <int> 100030, 100050, 100...
## $ url <chr> "http://world-en.op...
## $ creator <chr> "sebleouf", "foodor...
## $ created_t <int> 1424747544, 1450316...
## $ created_datetime <chr> "2015-02-24T03:12:2...
## $ last_modified_t <int> 1438445887, 1450817...
## $ last_modified_datetime <chr> "2015-08-01T16:18:0...
## $ product_name <chr> "Confiture de frais...
## $ generic_name <chr> "", "", "Pâtes de ...
## $ quantity <chr> "265 g", "375g", "1...
## $ packaging <chr> "Bocal,Verre", "Pla...
## $ packaging_tags <chr> "bocal,verre", "pla...
## $ brands <chr> "Casino Délices", ...
## $ brands_tags <chr> "casino-delices", "...
## $ categories <chr> "Aliments et boisso...
## $ categories_tags <chr> "en:plant-based-foo...
## $ categories_en <chr> "Plant-based foods ...
## $ origins <chr> "", "", "", "", "Ar...
## $ origins_tags <chr> "", "", "", "", "ar...
## $ manufacturing_places <chr> "France", "Belgium"...
## $ manufacturing_places_tags <chr> "france", "belgium"...
## $ labels <chr> "", "", "", "Vegeta...
## $ labels_tags <chr> "", "", "", "en:veg...
## $ labels_en <chr> "", "", "", "Vegeta...
## $ emb_codes <chr> "EMB 78015", "", ""...
## $ emb_codes_tags <chr> "emb-78015", "", ""...
## $ first_packaging_code_geo <chr> "48.983333,2.066667...
## $ cities <lgl> NA, NA, NA, NA, NA,...
## $ cities_tags <chr> "andresy-yvelines-f...
## $ purchase_places <chr> "Lyon,France", "NSW...
## $ stores <chr> "Casino", "", "", "...
## $ countries <chr> "France", "Australi...
## $ countries_tags <chr> "en:france", "en:au...
## $ countries_en <chr> "France", "Australi...
## $ ingredients_text <chr> "Sucre de canne, fr...
## $ allergens <chr> "", "", "", "", "",...
## $ allergens_en <lgl> NA, NA, NA, NA, NA,...
## $ traces <chr> "Lait,Fruits à coq...
## $ traces_tags <chr> "en:milk,en:nuts", ...
## $ traces_en <chr> "Milk,Nuts", "", ""...
## $ serving_size <chr> "15 g", "", "", "",...
## $ no_nutriments <lgl> NA, NA, NA, NA, NA,...
## $ additives_n <int> 1, NA, 2, 5, 0, NA,...
## $ additives <chr> "[ sucre-de-canne -...
## $ additives_tags <chr> "en:e440", "", "en:...
## $ additives_en <chr> "E440 - Pectins", "...
## $ ingredients_from_palm_oil_n <int> 0, NA, 0, 0, 0, NA,...
## $ ingredients_from_palm_oil <lgl> NA, NA, NA, NA, NA,...
## $ ingredients_from_palm_oil_tags <chr> "", "", "", "", "",...
## $ ingredients_that_may_be_from_palm_oil_n <int> 0, NA, 0, 1, 0, NA,...
## $ ingredients_that_may_be_from_palm_oil <lgl> NA, NA, NA, NA, NA,...
## $ ingredients_that_may_be_from_palm_oil_tags <chr> "", "", "", "e471-m...
## $ nutrition_grade_uk <lgl> NA, NA, NA, NA, NA,...
## $ nutrition_grade_fr <chr> "d", "", "", "d", "...
## $ pnns_groups_1 <chr> "Sugary snacks", "S...
## $ pnns_groups_2 <chr> "Sweets", "Chocolat...
## $ states <chr> "en:to-be-checked, ...
## $ states_tags <chr> "en:to-be-checked,e...
## $ states_en <chr> "To be checked,Comp...
## $ main_category <chr> "en:plant-based-foo...
## $ main_category_en <chr> "Plant-based foods ...
## $ image_url <chr> "http://en.openfood...
## $ image_small_url <chr> "http://en.openfood...
## $ energy_100g <dbl> 918, NA, NA, 766, 2...
## $ energy_from_fat_100g <dbl> NA, NA, NA, NA, NA,...
## $ fat_100g <dbl> 0.00, NA, NA, 16.70...
## $ saturated_fat_100g <dbl> 0.000, NA, NA, 9.90...
## $ butyric_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ caproic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ caprylic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ capric_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ lauric_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ myristic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ palmitic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ stearic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ arachidic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ behenic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ lignoceric_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ cerotic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ montanic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ melissic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ monounsaturated_fat_100g <dbl> NA, NA, NA, 2.9, 9....
## $ polyunsaturated_fat_100g <dbl> NA, NA, NA, 3.9, 32...
## $ omega_3_fat_100g <dbl> NA, NA, NA, NA, NA,...
## $ alpha_linolenic_acid_100g <dbl> NA, NA, NA, NA, NA,...
## $ eicosapentaenoic_acid_100g <dbl> NA, NA, NA, NA, NA,...
## $ docosahexaenoic_acid_100g <dbl> NA, NA, NA, NA, NA,...
## $ omega_6_fat_100g <dbl> NA, NA, NA, NA, NA,...
## $ linoleic_acid_100g <dbl> NA, NA, NA, NA, NA,...
## $ arachidonic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ gamma_linolenic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ dihomo_gamma_linolenic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ omega_9_fat_100g <lgl> NA, NA, NA, NA, NA,...
## $ oleic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ elaidic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ gondoic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ mead_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ erucic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ nervonic_acid_100g <lgl> NA, NA, NA, NA, NA,...
## $ trans_fat_100g <dbl> NA, NA, NA, NA, NA,...
## $ cholesterol_100g <dbl> NA, NA, NA, 0.00020...
## $ carbohydrates_100g <dbl> 54.00, NA, NA, 5.70...
## $ sugars_100g <dbl> 54.00, NA, NA, 4.20...
## $ sucrose_100g <lgl> NA, NA, NA, NA, NA,...
## $ glucose_100g <lgl> NA, NA, NA, NA, NA,...
## $ fructose_100g <int> NA, NA, NA, NA, NA,...
## $ lactose_100g <dbl> NA, NA, NA, NA, NA,...
## $ maltose_100g <lgl> NA, NA, NA, NA, NA,...
## $ maltodextrins_100g <lgl> NA, NA, NA, NA, NA,...
## $ starch_100g <dbl> NA, NA, NA, NA, NA,...
## $ polyols_100g <dbl> NA, NA, NA, NA, NA,...
## $ fiber_100g <dbl> NA, NA, NA, 0.2, 9....
## $ proteins_100g <dbl> 0.00, NA, NA, 2.90,...
## $ casein_100g <dbl> NA, NA, NA, NA, NA,...
## $ serum_proteins_100g <lgl> NA, NA, NA, NA, NA,...
## $ nucleotides_100g <lgl> NA, NA, NA, NA, NA,...
## $ salt_100g <dbl> 0.0000000, NA, NA, ...
## $ sodium_100g <dbl> 0.0000000, NA, NA, ...
## $ alcohol_100g <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_a_100g <dbl> NA, NA, NA, NA, NA,...
## $ beta_carotene_100g <lgl> NA, NA, NA, NA, NA,...
## $ vitamin_d_100g <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_e_100g <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_k_100g <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_c_100g <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_b1_100g <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_b2_100g <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_pp_100g <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_b6_100g <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_b9_100g <dbl> NA, NA, NA, NA, NA,...
## $ vitamin_b12_100g <dbl> NA, NA, NA, NA, NA,...
## $ biotin_100g <dbl> NA, NA, NA, NA, NA,...
## $ pantothenic_acid_100g <dbl> NA, NA, NA, NA, NA,...
## $ silica_100g <dbl> NA, NA, NA, NA, NA,...
## $ bicarbonate_100g <dbl> NA, NA, NA, NA, NA,...
## $ potassium_100g <dbl> NA, NA, NA, NA, NA,...
## $ chloride_100g <dbl> NA, NA, NA, NA, NA,...
## $ calcium_100g <dbl> NA, NA, NA, NA, NA,...
## $ phosphorus_100g <dbl> NA, NA, NA, NA, 1.1...
## $ iron_100g <dbl> NA, NA, NA, NA, 0.0...
## $ magnesium_100g <dbl> NA, NA, NA, NA, 0.1...
## $ zinc_100g <dbl> NA, NA, NA, NA, NA,...
## $ copper_100g <dbl> NA, NA, NA, NA, NA,...
## $ manganese_100g <dbl> NA, NA, NA, NA, NA,...
## $ fluoride_100g <dbl> NA, NA, NA, NA, NA,...
## $ selenium_100g <dbl> NA, NA, NA, NA, NA,...
## $ chromium_100g <lgl> NA, NA, NA, NA, NA,...
## $ molybdenum_100g <lgl> NA, NA, NA, NA, NA,...
## $ iodine_100g <dbl> NA, NA, NA, NA, NA,...
## $ caffeine_100g <lgl> NA, NA, NA, NA, NA,...
## $ taurine_100g <lgl> NA, NA, NA, NA, NA,...
## $ ph_100g <lgl> NA, NA, NA, NA, NA,...
## $ fruits_vegetables_nuts_100g <dbl> 54, NA, NA, NA, NA,...
## $ collagen_meat_protein_ratio_100g <int> NA, NA, NA, NA, NA,...
## $ cocoa_100g <int> NA, NA, NA, NA, NA,...
## $ chlorophyl_100g <lgl> NA, NA, NA, NA, NA,...
## $ carbon_footprint_100g <dbl> NA, NA, NA, NA, NA,...
## $ nutrition_score_fr_100g <int> 11, NA, NA, 11, 17,...
## $ nutrition_score_uk_100g <int> 11, NA, NA, 11, 17,...
# View column names of food
names(food)
## [1] "V1"
## [2] "code"
## [3] "url"
## [4] "creator"
## [5] "created_t"
## [6] "created_datetime"
## [7] "last_modified_t"
## [8] "last_modified_datetime"
## [9] "product_name"
## [10] "generic_name"
## [11] "quantity"
## [12] "packaging"
## [13] "packaging_tags"
## [14] "brands"
## [15] "brands_tags"
## [16] "categories"
## [17] "categories_tags"
## [18] "categories_en"
## [19] "origins"
## [20] "origins_tags"
## [21] "manufacturing_places"
## [22] "manufacturing_places_tags"
## [23] "labels"
## [24] "labels_tags"
## [25] "labels_en"
## [26] "emb_codes"
## [27] "emb_codes_tags"
## [28] "first_packaging_code_geo"
## [29] "cities"
## [30] "cities_tags"
## [31] "purchase_places"
## [32] "stores"
## [33] "countries"
## [34] "countries_tags"
## [35] "countries_en"
## [36] "ingredients_text"
## [37] "allergens"
## [38] "allergens_en"
## [39] "traces"
## [40] "traces_tags"
## [41] "traces_en"
## [42] "serving_size"
## [43] "no_nutriments"
## [44] "additives_n"
## [45] "additives"
## [46] "additives_tags"
## [47] "additives_en"
## [48] "ingredients_from_palm_oil_n"
## [49] "ingredients_from_palm_oil"
## [50] "ingredients_from_palm_oil_tags"
## [51] "ingredients_that_may_be_from_palm_oil_n"
## [52] "ingredients_that_may_be_from_palm_oil"
## [53] "ingredients_that_may_be_from_palm_oil_tags"
## [54] "nutrition_grade_uk"
## [55] "nutrition_grade_fr"
## [56] "pnns_groups_1"
## [57] "pnns_groups_2"
## [58] "states"
## [59] "states_tags"
## [60] "states_en"
## [61] "main_category"
## [62] "main_category_en"
## [63] "image_url"
## [64] "image_small_url"
## [65] "energy_100g"
## [66] "energy_from_fat_100g"
## [67] "fat_100g"
## [68] "saturated_fat_100g"
## [69] "butyric_acid_100g"
## [70] "caproic_acid_100g"
## [71] "caprylic_acid_100g"
## [72] "capric_acid_100g"
## [73] "lauric_acid_100g"
## [74] "myristic_acid_100g"
## [75] "palmitic_acid_100g"
## [76] "stearic_acid_100g"
## [77] "arachidic_acid_100g"
## [78] "behenic_acid_100g"
## [79] "lignoceric_acid_100g"
## [80] "cerotic_acid_100g"
## [81] "montanic_acid_100g"
## [82] "melissic_acid_100g"
## [83] "monounsaturated_fat_100g"
## [84] "polyunsaturated_fat_100g"
## [85] "omega_3_fat_100g"
## [86] "alpha_linolenic_acid_100g"
## [87] "eicosapentaenoic_acid_100g"
## [88] "docosahexaenoic_acid_100g"
## [89] "omega_6_fat_100g"
## [90] "linoleic_acid_100g"
## [91] "arachidonic_acid_100g"
## [92] "gamma_linolenic_acid_100g"
## [93] "dihomo_gamma_linolenic_acid_100g"
## [94] "omega_9_fat_100g"
## [95] "oleic_acid_100g"
## [96] "elaidic_acid_100g"
## [97] "gondoic_acid_100g"
## [98] "mead_acid_100g"
## [99] "erucic_acid_100g"
## [100] "nervonic_acid_100g"
## [101] "trans_fat_100g"
## [102] "cholesterol_100g"
## [103] "carbohydrates_100g"
## [104] "sugars_100g"
## [105] "sucrose_100g"
## [106] "glucose_100g"
## [107] "fructose_100g"
## [108] "lactose_100g"
## [109] "maltose_100g"
## [110] "maltodextrins_100g"
## [111] "starch_100g"
## [112] "polyols_100g"
## [113] "fiber_100g"
## [114] "proteins_100g"
## [115] "casein_100g"
## [116] "serum_proteins_100g"
## [117] "nucleotides_100g"
## [118] "salt_100g"
## [119] "sodium_100g"
## [120] "alcohol_100g"
## [121] "vitamin_a_100g"
## [122] "beta_carotene_100g"
## [123] "vitamin_d_100g"
## [124] "vitamin_e_100g"
## [125] "vitamin_k_100g"
## [126] "vitamin_c_100g"
## [127] "vitamin_b1_100g"
## [128] "vitamin_b2_100g"
## [129] "vitamin_pp_100g"
## [130] "vitamin_b6_100g"
## [131] "vitamin_b9_100g"
## [132] "vitamin_b12_100g"
## [133] "biotin_100g"
## [134] "pantothenic_acid_100g"
## [135] "silica_100g"
## [136] "bicarbonate_100g"
## [137] "potassium_100g"
## [138] "chloride_100g"
## [139] "calcium_100g"
## [140] "phosphorus_100g"
## [141] "iron_100g"
## [142] "magnesium_100g"
## [143] "zinc_100g"
## [144] "copper_100g"
## [145] "manganese_100g"
## [146] "fluoride_100g"
## [147] "selenium_100g"
## [148] "chromium_100g"
## [149] "molybdenum_100g"
## [150] "iodine_100g"
## [151] "caffeine_100g"
## [152] "taurine_100g"
## [153] "ph_100g"
## [154] "fruits_vegetables_nuts_100g"
## [155] "collagen_meat_protein_ratio_100g"
## [156] "cocoa_100g"
## [157] "chlorophyl_100g"
## [158] "carbon_footprint_100g"
## [159] "nutrition_score_fr_100g"
## [160] "nutrition_score_uk_100g"
This is a little more manageable. Before moving on, scroll through the column names and see if you can find pairs that might be duplicates.
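Scrolling works, but candidate pairs can also be shortlisted from the names themselves. The sketch below assumes that duplicates tend to share a base name and differ only by a trailing `_tags` or `_en`, which is a guess based on the listing above rather than a rule stated by the data source. The `cols` vector is a hypothetical subset of the real names.

```r
# Hypothetical subset of the column names shown above
cols <- c("packaging", "packaging_tags", "brands", "brands_tags",
          "countries", "countries_tags", "countries_en", "sugars_100g")

# Strip a trailing "_tags" or "_en" to get each column's base name
base <- sub("_(tags|en)$", "", cols)

# Group the names by base name and keep groups with more than one member
groups <- split(cols, base)
candidates <- groups[lengths(groups) > 1]
print(candidates)
```

Name similarity only suggests candidates. Before dropping anything, it is worth comparing the contents of the paired columns.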
Removing duplicate info
Wow! That’s a lot of variables. To summarize, the columns cover what was added and when (1:9), meta information about the food (10:17, 22:27), where it came from (18:21, 28:34), what it’s made of (35:52), nutrition grades (53:54), some unclear columns (55:63), and nutritional information (64:159). Note that these ranges appear not to count the V1 row-index column that comes first in the output above, so each position in that listing is shifted by one.
There are also many different pairs of columns that contain duplicate information. Luckily, you have a trusty assistant who went through and identified duplicate columns for you.
A vector has been created for you that lists out all of the duplicates; all you need to do is remove those columns from the dataset. Don’t forget, you can use the - operator to specify columns to omit, e.g.:
my_df[, -3]  # Omit third column
# Define vector of duplicate cols (don't change)
duplicates <- c(4, 6, 11, 13, 15, 17, 18, 20, 22,
24, 25, 28, 32, 34, 36, 38, 40,
44, 46, 48, 51, 54, 65, 158)
# Remove duplicates from food: food2
food2 <- food[, -duplicates]
Removing useless info
Your dataset is much more manageable already.
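The negative-index removal used for the duplicate columns can be tried on a small example. This is a sketch on a hypothetical `toy` data frame. It also shows one pitfall of the idiom, which is that an empty index vector drops every column, and one possible guard against it.

```r
# Toy data frame with four columns (hypothetical values)
toy <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# Drop columns 2 and 4 by position
drop_idx <- c(2, 4)
kept <- toy[, -drop_idx]
names(kept)  # "a" "c"

# Pitfall: -integer(0) is still integer(0), so this selects no columns at all
none <- integer(0)
ncol(toy[, -none])  # 0

# One guard: compute the positions to keep instead of negating
keep_idx <- setdiff(seq_along(toy), none)
safe <- toy[, keep_idx, drop = FALSE]
ncol(safe)  # 4
```

Positional indices are also fragile in another way. They refer to the data frame being subset at that moment, so the positions in a second removal step are counted on the already reduced data frame, and an extra leading column shifts every position by one.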
In addition to duplicate columns, there are many columns containing information that you just can’t use. For example, the first few columns contain internal codes that don’t have any meaning to us. There are also some column names that aren’t clear enough to tell what they contain.
All of these columns can be deleted. Once again, your assistant did a splendid job finding the indices for you.
# Define useless vector (don't change)
useless <- c(1, 2, 3, 32:41)
# Remove useless columns from food2: food3
food3 <- food2[, -useless]
Finding columns
Looking much nicer! Recall from the first exercise that your goal is to analyze the sugar content of these foods. Therefore, your next step is to look at a summary of the nutrition information.
All of the columns with nutrition info contain the character string “100g” as part of their name, which makes it easy to identify them.
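As a small illustration of this name matching, the sketch below runs on a hypothetical `toy` data frame. It uses base R's grepl() with fixed = TRUE as a stand-in for str_detect(), so it runs without extra packages. It then summarizes each matched column with a function that always returns the same number of values. That last step is a swapped-in alternative to summary(), which returns an extra NA count for columns with missing values and is the likely cause of the recycling warning in the output below.

```r
# Toy stand-in for `food3` (hypothetical values)
toy <- data.frame(
  product_name  = c("Jam", "Cream", "Seeds"),
  sugars_100g   = c(54, 4.2, 2.7),
  proteins_100g = c(0, 2.9, 18.2),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE where the column name contains the literal text "100g"
is_nutrition <- grepl("100g", names(toy), fixed = TRUE)

# Select the matching columns; drop = FALSE keeps a data frame even for one match
nutri <- toy[, is_nutrition, drop = FALSE]

# Fixed-length summary per column, so the results bind without recycling
col_stats <- sapply(nutri, function(x) {
  c(min       = min(x, na.rm = TRUE),
    mean      = mean(x, na.rm = TRUE),
    max       = max(x, na.rm = TRUE),
    n_missing = sum(is.na(x)))
})
print(col_stats)
```

Because the result of the match is a logical vector rather than a set of positions, it can be used directly inside the square brackets.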
library(stringr)
# Create logical vector flagging nutrition columns: nutrition
nutrition <- str_detect(names(food3), "100g")
# View a summary of nutrition columns
sum_food3 <- as.data.frame(do.call(cbind, lapply(food3[,nutrition], summary)))
## Warning in (function (..., deparse.level = 1) : number of rows of result is
## not a multiple of vector length (arg 4)
sum_food3 %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")

| | energy_from_fat_100g | fat_100g | saturated_fat_100g | butyric_acid_100g | caproic_acid_100g | caprylic_acid_100g | capric_acid_100g | lauric_acid_100g | myristic_acid_100g | palmitic_acid_100g | stearic_acid_100g | arachidic_acid_100g | behenic_acid_100g | lignoceric_acid_100g | cerotic_acid_100g | montanic_acid_100g | melissic_acid_100g | monounsaturated_fat_100g | polyunsaturated_fat_100g | omega_3_fat_100g | alpha_linolenic_acid_100g | eicosapentaenoic_acid_100g | docosahexaenoic_acid_100g | omega_6_fat_100g | linoleic_acid_100g | arachidonic_acid_100g | gamma_linolenic_acid_100g | dihomo_gamma_linolenic_acid_100g | omega_9_fat_100g | oleic_acid_100g | elaidic_acid_100g | gondoic_acid_100g | mead_acid_100g | erucic_acid_100g | nervonic_acid_100g | trans_fat_100g | cholesterol_100g | carbohydrates_100g | sugars_100g | sucrose_100g | glucose_100g | fructose_100g | lactose_100g | maltose_100g | maltodextrins_100g | starch_100g | polyols_100g | fiber_100g | proteins_100g | casein_100g | serum_proteins_100g | nucleotides_100g | salt_100g | sodium_100g | alcohol_100g | vitamin_a_100g | beta_carotene_100g | vitamin_d_100g | vitamin_e_100g | vitamin_k_100g | vitamin_c_100g | vitamin_b1_100g | vitamin_b2_100g | vitamin_pp_100g | vitamin_b6_100g | vitamin_b9_100g | vitamin_b12_100g | biotin_100g | pantothenic_acid_100g | silica_100g | bicarbonate_100g | potassium_100g | chloride_100g | calcium_100g | phosphorus_100g | iron_100g | magnesium_100g | zinc_100g | copper_100g | manganese_100g | fluoride_100g | selenium_100g | chromium_100g | molybdenum_100g | iodine_100g | caffeine_100g | taurine_100g | ph_100g | fruits_vegetables_nuts_100g | collagen_meat_protein_ratio_100g | cocoa_100g | chlorophyl_100g | nutrition_score_fr_100g | nutrition_score_uk_100g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. | 0 | 0 | 0 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 0 | 0.4 | 0.033 | 0.08 | 0.721 | 1.09 | 0.25 | 0.5 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 0 | 0 | 0 | 0 | logical | logical | 100 | 0 | logical | logical | 0 | 8.6 | 0 | 0 | 1.1 | logical | logical | 0 | 0 | 0 | 0 | logical | 7.5e-07 | 5e-04 | 5.3e-06 | 0 | 6e-05 | 0.000176 | 0.00059 | 6.6e-05 | 1.13e-05 | 2e-07 | 1.9e-06 | 9e-07 | 0.00082 | 0.00063 | 4e-05 | 3e-04 | 0 | 0.043 | 0 | 5e-05 | 5e-04 | 3.6e-05 | 6.5e-06 | 2.7e-06 | 1.44e-06 | logical | logical | 1e-05 | logical | logical | logical | 2 | 12 | 30 | logical | -12 | -12 |
| 1st Qu. | 35.975 | 0.9 | 0.2 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 3.87 | 1.6525 | 1.3 | 0.0905 | 0.721 | 1.09 | 0.25 | 0.5165 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 0 | 0 | 3.7925 | 1 | 1500 | 1500 | 100 | 0.25 | 1500 | 1500 | 9.45 | 59.1 | 0.5 | 1.5 | 1.1 | 1500 | 1500 | 0.04375 | 0.0172244094488189 | 0 | 0 | 1500 | 9.5e-07 | 0.002125 | 6.85e-06 | 0.002 | 0.0002925 | 0.00026 | 0.003325 | 0.00023 | 5e-05 | 4e-07 | 3.3e-06 | 0.000685 | 0.00082 | 0.067815 | 0.065 | 6e-04 | 0.045 | 0.19375 | 0.0012 | 0.067 | 9e-04 | 6.025e-05 | 6.5e-06 | 4.525e-06 | 1.44e-06 | 1500 | 1500 | 1e-05 | 1500 | 1500 | 1500 | 11.25 | 13.5 | 47 | 1500 | 1 | 0 |
| Median | 237 | 6 | 1.7 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 9.5 | 3.9 | 3 | 0.101 | 0.721 | 1.09 | 0.25 | 0.533 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 0 | 0 | 13.5 | 4.05 | logical | logical | 100 | 0.5 | logical | logical | 39.5 | 67 | 1.75 | 6 | 1.1 | logical | logical | 0.44979 | 0.177082677165355 | 5.5 | 7e-05 | logical | 3e-06 | 0.0044 | 8.4e-06 | 0.019 | 0.00045 | 0.00093 | 0.0069 | 8e-04 | 7.3e-05 | 2e-06 | 4.7e-06 | 0.00195 | 0.00082 | 0.135 | 0.194 | 9e-04 | 0.12 | 0.3185 | 0.0042 | 0.104 | 0.00167 | 8.45e-05 | 6.5e-06 | 6.35e-06 | 1.44e-06 | logical | logical | 1e-05 | logical | logical | logical | 42 | 15 | 60 | logical | 7 | 6 |
| Mean | 668.407142857143 | 13.3945006313131 | 4.87399004267425 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 19.7731428571429 | 9.98555555555556 | 3.72588888888889 | 0.173666666666667 | 0.721 | 1.09 | 0.25 | 0.533 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 0.0105263157894737 | 0.0264565217391304 | 27.9578118686869 | 12.6564831460674 | 1500 | 1500 | 100 | 2.93333333333333 | 1500 | 1500 | 30.7285714285714 | 56.0555555555556 | 2.82298913043478 | 7.56324050632911 | 1.1 | 1500 | 1500 | 1.12053058111111 | 0.440933823928259 | 10.0671641791045 | 0.000303926086956522 | 1500 | 1.29393333333333e-05 | 0.00689818181818182 | 8.4e-06 | 0.024971487804878 | 0.000605 | 0.00111858823529412 | 0.008555625 | 0.0112242105263158 | 0.000110858823529412 | 1.42272727272727e-06 | 4.7e-06 | 0.00267827857142857 | 0.00082 | 0.16921 | 0.328764615384615 | 0.0144 | 0.203958235294118 | 0.377666666666667 | 0.00454708108108108 | 0.106559523809524 | 0.00158142857142857 | 8.45e-05 | 6.5e-06 | 6.35e-06 | 1.44e-06 | 1500 | 1500 | 1e-05 | 1500 | 1500 | 1500 | 36.885 | 15.6666666666667 | 57 | 1500 | 7.94074074074074 | 7.63111111111111 |
| 3rd Qu. | 974 | 20 | 6.5 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 29 | 12.7 | 3.2 | 0.2205 | 0.721 | 1.09 | 0.25 | 0.5495 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 0 | 0.002625 | 55 | 14.7 | logical | logical | 100 | 4.4 | logical | logical | 42.85 | 69.8 | 3.5 | 10.675 | 1.1 | logical | logical | 1.1938 | 0.47 | 13 | 0.0005975 | logical | 5.5e-06 | 0.0097 | 9.95e-06 | 0.03 | 0.0009625 | 0.00127 | 0.01405 | 0.001235 | 0.00017 | 2.245e-06 | 6.1e-06 | 0.005075 | 0.00082 | 0.2535 | 0.367 | 0.02145 | 0.1985 | 0.434 | 0.00771 | 0.13 | 0.00225 | 0.00010875 | 6.5e-06 | 8.175e-06 | 1.44e-06 | logical | logical | 1e-05 | logical | logical | logical | 52.25 | 17.5 | 70 | logical | 15 | 16 |
| Max. | 2900 | 100 | 57 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 75 | 46.2 | 12.4 | 0.34 | 0.721 | 1.09 | 0.25 | 0.566 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 1500 | 0.1 | 0.43 | 100 | 100 | 1500 | 1500 | 100 | 8.3 | 1500 | 1500 | 71 | 70 | 46.7 | 61 | 1.1 | 1500 | 1500 | 102 | 40 | 50 | 0.001346 | 1500 | 1e-04 | 0.032 | 1.15e-05 | 0.217 | 0.0013 | 0.0066 | 0.016 | 0.2 | 0.000237 | 2.5e-06 | 7.5e-06 | 0.006 | 0.00082 | 0.372 | 1.43 | 0.042 | 1 | 1.155 | 0.0137 | 0.333 | 0.0026 | 0.000133 | 6.5e-06 | 1e-05 | 1.44e-06 | 1500 | 1500 | 1e-05 | 1500 | 1500 | 1500 | 80 | 20 | 81 | 1500 | 28 | 28 |
| NA’s | 1486 | 708 | 797 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 1465 | 1464 | 1491 | 1497 | 1499 | 1499 | 1499 | 1498 | logical | logical | logical | logical | logical | logical | logical | logical | logical | logical | 1481 | 1477 | 708 | 788 | logical | logical | 1499 | 1497 | logical | logical | 1493 | 1491 | 994 | 710 | 1499 | logical | logical | 780 | 780 | 1433 | 1477 | logical | 1485 | 1478 | 1498 | 1459 | 1478 | 1483 | 1484 | 1481 | 1483 | 1489 | 1498 | 1486 | 1499 | 1497 | 1487 | 1497 | 1449 | 1488 | 1463 | 1479 | 1493 | 1498 | 1499 | 1498 | 1499 | logical | logical | 1499 | logical | logical | logical | 1470 | 1497 | 1491 | logical | 825 | 825 |
Take a look at the results before moving on. Is there anything noteworthy about the nutrition data?
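One way to read missingness off more directly than from the wide summary table is to compute the share of NA values per column. The sketch below does this on a small, invented data frame (`df`, `na_share`, and `all_missing` are hypothetical names). A column that is entirely NA is stored as logical here, which is presumably also why the all-empty columns appear as "logical" in the table above.

```r
# Hypothetical nutrition columns with varying amounts of missing data
df <- data.frame(
  fat_100g    = c(1.5, NA, 20, NA),
  sugars_100g = c(NA, NA, NA, 4),
  casein_100g = c(NA, NA, NA, NA)
)

# is.na() gives a TRUE/FALSE matrix; column means are the NA fractions
na_share <- colMeans(is.na(df))
na_share

# Names of columns that contain no observed values at all
all_missing <- names(df)[na_share == 1]
all_missing
```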
Replacing missing values
Unfortunately, the summary revealed that the nutrition data are mostly NA values. After consulting with the lab technician, it appears that much of the data is missing because the food just doesn’t have those nutrients.
But all is not lost! The lab tech also said that for sugar content, zero values are sometimes entered explicitly, but sometimes the values are just left empty to denote a zero. A statistical miracle!
In this exercise, you’ll replace all NA values with zeroes in the sugars_100g column and make histograms to visualize the result. Then, you will exclude the observations which have no sugar to see how the distribution changes.
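The order of these two steps matters. The sketch below, on a made-up data frame (`foods` and its values are hypothetical), shows what happens when rows are filtered before the NA values are replaced: an NA in a logical row index produces a row of NAs rather than dropping the row.

```r
# Hypothetical sugar measurements, with NA standing in for "no sugar"
foods <- data.frame(
  id          = 1:5,
  sugars_100g = c(NA, 0, 5, NA, 12)
)

# Filtering first: each NA comparison yields an all-NA row in the result
filtered_raw <- foods[foods$sugars_100g > 0, ]
nrow(filtered_raw)

# Replace NA with 0, then filter: only rows with sugar remain
missing <- is.na(foods$sugars_100g)
foods$sugars_100g[missing] <- 0
filtered_clean <- foods[foods$sugars_100g > 0, ]
filtered_clean$id
```

Replacing first, as the exercise does, keeps the filtered data free of those spurious NA rows.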
# Find indices of sugar NA values: missing
missing <- is.na(food3$sugars_100g)
# Replace NA values with 0
food3$sugars_100g[missing] <- 0
# Create first histogram
hist(food3$sugars_100g, breaks = 100)
# Create food4
food4 <- food3[food3$sugars_100g > 0, ]
# Create second histogram
hist(food4$sugars_100g, breaks = 100)
Excluding the observations which don’t contain any sugar, you can better visualize what the underlying distribution looks like. And now, for something completely different.
Dealing with messy data
Your analysis of sugar content was so impressive that you’ve now been tasked with determining how many of these foods come in some sort of plastic packaging. (No good deed goes unpunished, as they say.)
Your dataset has information about packaging, but there’s a bit of a problem: it’s stored in several different languages (Spanish, French, and English). This takes messy data to a whole new level! There is no R package to selectively translate, but what if you could just work with the messy data directly?
You’re in luck! The root word for plastic is the same in English (plastic), French (plastique), and Spanish (plastico). To get a general idea of how many of these foods are packaged in plastic, you can look through the packaging column for the string “plasti”.
# Find entries containing "plasti": plastic
plastic <- str_detect(food3$packaging, "plasti")
# Print the sum of plastic
sum(plastic)
## [1] 232
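The match on "plasti" is case-sensitive, so an entry such as "Plastic" would not be counted. The sketch below illustrates the difference on a handful of invented packaging strings (the vector `packaging` is hypothetical and is not taken from food.csv). It uses base R's `grepl()` rather than `str_detect()`; one difference to be aware of is that `grepl()` returns FALSE for NA input, whereas `str_detect()` returns NA, which would make a plain `sum()` return NA as well.

```r
# Hypothetical packaging descriptions in mixed languages and cases
packaging <- c(
  "Plastic bottle",
  "bouteille plastique",
  "bolsa de plastico",
  "carton",
  NA
)

# Case-sensitive match, comparable to the pattern used above
case_sensitive <- grepl("plasti", packaging, fixed = TRUE)
sum(case_sensitive)

# Case-insensitive match also catches the capitalized entry
case_insensitive <- grepl("plasti", packaging, ignore.case = TRUE)
sum(case_insensitive)
```

Whether this changes the count of 232 depends on how the real packaging column is capitalized, which this sketch does not check.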